Abstract

In recent studies on sparse modeling, $l_q$ ($0 < q < 1$) regularized least squares regression ($l_q$LS) has received considerable attention due to its superiority in sparsity inducing and bias reduction over its convex counterparts. In this paper, we propose a Gauss-Seidel iterative thresholding algorithm (called GAITA) for solving this problem. Different from the classical iterative thresholding algorithms, which use the Jacobi updating rule, GAITA uses the Gauss-Seidel rule to update the coordinate coefficients. Under a mild condition, we show that the support set and sign of an arbitrary sequence generated by GAITA converge within finitely many iterations. This convergence property, together with the Kurdyka-Łojasiewicz property of ($l_q$LS), yields the strong convergence of GAITA under the same condition, which is generally weaker than the condition for the convergence of the classical iterative thresholding algorithms. Furthermore, we demonstrate that GAITA converges to a local minimizer under certain additional conditions. A set of numerical experiments is provided to show the effectiveness, and in particular the much faster convergence, of GAITA as compared with the classical iterative thresholding algorithms.
1. Introduction


In this paper, we consider the following $l_q$ ($0 < q < 1$) regularized least squares regression ($l_q$LS) problem

$$(l_q\mathrm{LS}) \qquad \min_{x \in \mathbb{R}^N} \left\{ T_\lambda(x) = \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_q^q \right\}, \tag{1.1}$$

where $\|x\|_q^q = \sum_{i=1}^N |x_i|^q$, $N$ is the dimension of $x$, and $\lambda > 0$ is a regularization parameter.

The ($l_q$LS) problem has attracted much attention in both scientific research and engineering practice, since it commonly has a stronger sparsity-promoting ability and a better bias-reduction property than the $l_1$ case. Its typical applications include signal processing (lsj], [13], image processing [11], [2jJ, synthetic aperture radar imaging [39], and machine learning [24].

One of the most important classes of algorithms for solving the ($l_q$LS) problem is the iterative thresholding algorithm (ITA) fl, [38]. Compared with some other classes of algorithms, such as the iteratively reweighted least squares (IRLS) minimization [16] and the iteratively reweighted $l_1$-minimization (IRL1) [10] algorithms, ITA generally has lower computational complexity for large-scale problems [39], which has triggered intensive research on ITA in the past decade (see [17], [38], [40]).


ITA comprises two steps: a gradient descent-type iteration for the least squares term and a thresholding operator. In detail, for an arbitrary $\mu > 0$, the thresholding function (or proximity operator) for ($l_q$LS) can be defined as


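$$\mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}(x) = \arg\min_{u \in \mathbb{R}^N} \left\{ \frac{\|x - u\|_2^2}{2\mu} + \lambda\|u\|_q^q \right\}. \tag{1.2}$$

Since $\|\cdot\|_q^q$ is separable, computing $\mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}$ reduces to solving several one-dimensional minimization problems, that is,

$$\mathrm{prox}_{\mu,\lambda|\cdot|^q}(z) = \arg\min_{v \in \mathbb{R}} \left\{ \frac{|z - v|^2}{2\mu} + \lambda|v|^q \right\}, \tag{1.3}$$

and thus,

$$\mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}(x) = \left(\mathrm{prox}_{\mu,\lambda|\cdot|^q}(x_1), \ldots, \mathrm{prox}_{\mu,\lambda|\cdot|^q}(x_N)\right)^T. \tag{1.4}$$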


For some $q$, such as $\frac{1}{2}$ or $\frac{2}{3}$, $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(\cdot)$ can be expressed analytically [38]. For other $q \in (0,1)$, we can use an iterative scheme proposed in [26] to compute the operator $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(\cdot)$. All these make the thresholding operator achievable. Then, an efficient gradient-descent iteration for the un-regularized least squares problem ($\lambda = 0$ in ($l_q$LS)), together with the aforementioned thresholding operator, yields a feasible scheme for solving ($l_q$LS).
1.1. Jacobi iteration and Gauss-Seidel iteration


As the thresholding operator depends only on $q$, the convergence of ITA depends heavily on the properties of the gradient-descent type iteration. The Landweber-type iteration is a natural choice for solving the un-regularized least squares problem, since its feasibility has been sufficiently verified in the literature (say, [22]). In the classical ITA [8], [17], [38], a Jacobi iteration strategy, in which the Landweber iteration rule is imposed on the whole variable $x^n$, is employed to derive the estimate. We denote such an algorithm as JAITA henceforth. More specifically, JAITA for ($l_q$LS) can be described as

$$x^{n+1} \in \mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}\left(x^n - \mu A^T(Ax^n - y)\right), \tag{1.5}$$

where $\mu > 0$ is a step size parameter.

As a cousin of the Jacobi scheme, the Gauss-Seidel scheme is also widely used as a building block for more complicated algorithms [34], [35], [36], [37]. Different from the Jacobi iteration, which updates all the components simultaneously, the Gauss-Seidel iteration is a component-wise scheme. Generally speaking, the Gauss-Seidel iteration is faster than the corresponding Jacobi iteration [36], since it uses the latest updates at each iteration. The aim of this paper is to introduce the Gauss-Seidel scheme to solve ($l_q$LS). The core of the detailed Gauss-Seidel update rule is a concrete representation of the thresholding function, which is derived in the recent work [9].

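According to [9], $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(\cdot)$ can be expressed as

$$\mathrm{prox}_{\mu,\lambda|\cdot|^q}(z) = \begin{cases} \left(\cdot + \lambda\mu q\,\mathrm{sgn}(\cdot)|\cdot|^{q-1}\right)^{-1}(z), & \text{for } |z| \ge \tau_{\lambda\mu,q} \\ 0, & \text{for } |z| \le \tau_{\lambda\mu,q} \end{cases} \tag{1.6}$$

for any $z \in \mathbb{R}$ with

$$\tau_{\lambda\mu,q} = \frac{2-q}{2-2q}\left(2\lambda\mu(1-q)\right)^{\frac{1}{2-q}}, \tag{1.7}$$

$$\eta_{\lambda\mu,q} = \left(2\lambda\mu(1-q)\right)^{\frac{1}{2-q}}, \tag{1.8}$$

and the range of $\mathrm{prox}_{\mu,\lambda|\cdot|^q}$ is $\{0\} \cup [\eta_{\lambda\mu,q}, \infty)$; $\mathrm{sgn}(\cdot)$ represents the sign function henceforth. When $|z| \ge \tau_{\lambda\mu,q}$, the relation $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(z) = (\cdot + \lambda\mu q\,\mathrm{sgn}(\cdot)|\cdot|^{q-1})^{-1}(z)$ means that $v = \mathrm{prox}_{\mu,\lambda|\cdot|^q}(z)$ satisfies the following equation

$$v + \lambda\mu q \cdot \mathrm{sgn}(v)|v|^{q-1} = z.$$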
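The representation (1.6)-(1.8) can be sketched numerically as follows. This is only an illustrative sketch: the function names are hypothetical, the inverse in (1.6) is computed with a plain Newton iteration (an assumption made here, not necessarily the iterative scheme of [26]), and at the tie $|z| = \tau_{\lambda\mu,q}$ the sketch simply returns 0.

```python
import math

def lq_thresholds(lam, mu, q):
    """Return (tau, eta) as defined in (1.7) and (1.8)."""
    eta = (2.0 * lam * mu * (1.0 - q)) ** (1.0 / (2.0 - q))
    tau = (2.0 - q) / (2.0 - 2.0 * q) * eta
    return tau, eta

def prox_lq(z, lam, mu, q, tol=1e-14, max_iter=100):
    """One minimizer of (z - v)^2 / (2*mu) + lam*|v|^q over real v.

    Hypothetical helper: returns 0 when |z| <= tau (including the tie),
    otherwise solves v + lam*mu*q*v^(q-1) = |z| for the larger root.
    """
    tau, _ = lq_thresholds(lam, mu, q)
    a = abs(z)
    if a <= tau:
        return 0.0
    c = lam * mu * q
    v = a  # start to the right of the root; the function is convex there
    for _ in range(max_iter):
        g = v + c * v ** (q - 1.0) - a
        dg = 1.0 + c * (q - 1.0) * v ** (q - 2.0)
        step = g / dg
        v -= step
        if abs(step) <= tol * max(1.0, v):
            break
    return math.copysign(v, z)

# Example with lam = mu = 1 and q = 1/2, where eta = 1 and tau = 1.5.
tau, eta = lq_thresholds(1.0, 1.0, 0.5)
v = prox_lq(2.0, 1.0, 1.0, 0.5)
```

In this sketch any nonzero output has magnitude at least $\eta_{\lambda\mu,q}$, which matches the stated range $\{0\} \cup [\eta_{\lambda\mu,q}, \infty)$ of the operator.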


Now we are in a position to present the proposed algorithm by utilizing the Gauss-Seidel iteration. Given the current estimate $x^n$ and the step size $\mu$, at the next iteration, the $i$-th coefficient is selected cyclically by

$$i = \begin{cases} N, & \text{if } 0 = (n+1) \bmod N \\ (n+1) \bmod N, & \text{otherwise.} \end{cases} \tag{1.9}$$

We then derive a component-based update of the un-regularized least squares by

$$z_i^n = x_i^n - \mu A_i^T(Ax^n - y), \tag{1.10}$$


which together with the thresholding operator then yields a component-based update for ($l_q$LS) as

$$x_i^{n+1} \in \arg\min_{v \in \mathbb{R}} \left\{ \frac{|z_i^n - v|^2}{2\mu} + \lambda|v|^q \right\} = \mathrm{prox}_{\mu,\lambda|\cdot|^q}(z_i^n). \tag{1.11}$$


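It can be seen from (1.6) that $\mathrm{prox}_{\mu,\lambda|\cdot|^q}$ is a set-valued operator. Therefore, motivated by [26], we select a particular single-valued operator of $\mathrm{prox}_{\mu,\lambda|\cdot|^q}$ and then update $x_i^{n+1}$ according to the following scheme,

$$x_i^{n+1} = T(z_i^n, x_i^n),$$

where

$$T(z_i^n, x_i^n) = \begin{cases} \mathrm{prox}_{\mu,\lambda|\cdot|^q}(z_i^n), & \text{if } |z_i^n| \neq \tau_{\lambda\mu,q} \\ \mathrm{sgn}(z_i^n)\,\eta_{\lambda\mu,q}\,I(x_i^n \neq 0), & \text{if } |z_i^n| = \tau_{\lambda\mu,q} \end{cases}$$

and $I(x_i^n \neq 0)$ denotes the indicator function, that is,

$$I(x_i^n \neq 0) = \begin{cases} 1, & \text{if } x_i^n \neq 0 \\ 0, & \text{otherwise.} \end{cases}$$

The other components of $x^{n+1}$ are kept fixed, i.e.,

$$x_j^{n+1} = x_j^n \tag{1.12}$$

for $j \neq i$.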

For the sake of brevity, in the rest of the paper we write $\tau_{\mu,q}$ and $\eta_{\mu,q}$ in place of $\tau_{\lambda\mu,q}$ and $\eta_{\lambda\mu,q}$, respectively. In summary, we can formulate the proposed algorithm as follows.

Gauss-Seidel Iterative Thresholding Algorithm (GAITA) 

Initialize with $x^0$. Choose a step size $\mu > 0$, let $n := 0$.

Step 1. Calculate the index $i$ according to (1.9);

Step 2. Calculate $z_i^n$ according to (1.10);

Step 3. Update $x_i^{n+1}$ via (1.11) and set $x_j^{n+1} = x_j^n$ for $j \neq i$;

Step 4. Check the termination rule. If it is satisfied, stop;

otherwise, let $n := n + 1$ and go to Step 1.
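Steps 1-4 can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `gaita`, `threshold` and `objective` are hypothetical, indices are 0-based (so the cyclic rule (1.9) becomes `n % N`), the matrix is stored as a list of columns, the scalar thresholding is computed by a Newton iteration, and a fixed iteration count stands in for the unspecified termination rule.

```python
import math

def threshold(z, x_old, lam, mu, q):
    """Single-valued thresholding T(z, x_old) for the scalar l_q problem."""
    eta = (2.0 * lam * mu * (1.0 - q)) ** (1.0 / (2.0 - q))
    tau = (2.0 - q) / (2.0 - 2.0 * q) * eta
    a = abs(z)
    if a < tau:
        return 0.0
    s = math.copysign(1.0, z)
    if a == tau:  # tie: keep the previous support
        return s * eta if x_old != 0.0 else 0.0
    c, v = lam * mu * q, a
    for _ in range(100):  # Newton iteration on v + c*v^(q-1) = |z|
        step = (v + c * v ** (q - 1.0) - a) / (1.0 + c * (q - 1.0) * v ** (q - 2.0))
        v -= step
        if abs(step) <= 1e-14 * max(1.0, v):
            break
    return s * v

def objective(cols, y, x, lam, q):
    """T_lambda(x) = 0.5*||A x - y||_2^2 + lam*||x||_q^q, cols[j] = j-th column of A."""
    r = [sum(cols[j][k] * x[j] for j in range(len(x))) - y[k] for k in range(len(y))]
    return 0.5 * sum(t * t for t in r) + lam * sum(abs(t) ** q for t in x)

def gaita(cols, y, lam, q, mu, n_iter):
    """Run n_iter coordinate updates from x = 0; return (x, objective history)."""
    N, m = len(cols), len(y)
    x = [0.0] * N
    r = [-t for t in y]                      # residual A x - y at x = 0
    hist = [objective(cols, y, x, lam, q)]
    for n in range(n_iter):
        i = n % N                            # Step 1: cyclic index (0-based)
        grad_i = sum(cols[i][k] * r[k] for k in range(m))
        z = x[i] - mu * grad_i               # Step 2: coordinate Landweber step
        new = threshold(z, x[i], lam, mu, q) # Step 3: thresholding
        d = new - x[i]
        if d != 0.0:
            for k in range(m):               # keep the residual up to date
                r[k] += d * cols[i][k]
            x[i] = new
        hist.append(objective(cols, y, x, lam, q))
    return x, hist

# Toy data (hypothetical): three unit-norm columns in R^2.
cols = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
y = [1.0, 0.5]
x_hat, hist = gaita(cols, y, lam=0.01, q=0.5, mu=0.9, n_iter=60)
```

Maintaining the residual $Ax - y$ incrementally keeps the cost of each coordinate update proportional to the number of rows of $A$, which is one natural way to exploit the fact that only one coefficient changes per iteration.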
1.2. Why Gauss-Seidel?

As seen in the previous subsection, the main difference between JAITA and GAITA lies in whether the Landweber iteration is applied component-wise. Such a slight difference might suggest that the convergence behaviors of the two algorithms are similar. To check this, we conduct a set of experiments comparing the convergence of JAITA and GAITA. Interestingly, we find in this experiment that the convergence behaviors of the two algorithms are totally different.

In detail, given a sparse signal $x$ with dimension $N = 500$ and sparsity $k^* = 15$, we considered the signal recovery problem through the observation $y = Ax$, where the original sparse signal $x$ was generated randomly according to the standard Gaussian distribution, and $A$ was of dimension $m \times N = 250 \times 500$ with i.i.d. Gaussian $\mathcal{N}(0, 1/250)$ entries and was preprocessed via column-normalization, i.e., $\|A_i\|_2 = 1$ for any $i$. We then applied GAITA and JAITA to the ($l_q$LS) problem with two different $q$, namely $q = 1/2$ and $q = 2/3$. In both cases, the thresholding functions can be analytically expressed, as shown in [38] and [11], respectively, and thus the corresponding algorithms can be efficiently implemented. In both cases, we set $\lambda = 0.001$ and $\mu = 0.95$ for both JAITA and GAITA. Moreover, the initial guesses were taken as 0 in all cases. The trends of the objective sequences in the different cases are shown in Fig. 1.

From Fig. 1, the objective sequences of JAITA diverge for both $q = 1/2$ and $q = 2/3$, while those of GAITA converge. This means that there exists some $\mu$ for which JAITA diverges but GAITA converges, which strongly motivates our research, since a larger range of $\mu$ that guarantees convergence essentially enlarges the applicable range of iterative thresholding-type algorithms.

We then naturally turn to verifying theoretically the phenomenon shown in Fig. 1. That is, the aim of our study is to answer the following questions:

(Q1) Is the convergence condition of GAITA indeed weaker than that of JAITA?

(Q2) If the answer to the above question is positive, then what is the applicable range of $\mu$ for GAITA to guarantee the convergence?


Figure 1: An experiment that motivates the use of the Gauss-Seidel scheme. (a) Convergence of GAITA: the trends of the objective function sequences, i.e., $\{T_\lambda(x^n)\}$, of GAITA for $q = 1/2$ and $q = 2/3$. (b) Divergence of JAITA: the trends of the objective function sequences of JAITA for $q = 1/2$ and $q = 2/3$. In both panels the horizontal axis is the number of iterations $n$.

1.3. Related Literature

There are many methods for solving the ($l_q$LS) problem. Some general methods, such as those in 0 s, 0, 0 , H, [l4, If], 191 and the references therein, and also the books [5], [29], do not update the iterations by using the Gauss-Seidel scheme. In Q], the subsequential convergence of the iterative thresholding algorithm for ($l_q$LS) with an arbitrary $q \in (0,1)$, and further the global convergence for ($l_q$LS) with a rational $q$, have been verified under the condition $0 < \mu < \|A\|_2^{-2}$. In [1], the global convergence of the iterative thresholding algorithm for ($l_q$LS) with an arbitrary $q$ has been justified under the same condition. Besides these general methods, there are several specific iterative thresholding algorithms for solving ($l_q$LS) with a specific $q$, such as hard for $l_0$ [8], soft for $l_1$ Id and half for $l_{1/2}$ [38]. Under the same condition, all these specific iterative thresholding algorithms converge to a stationary point.

Another closely related class of algorithms is the block coordinate descent (BCD) algorithm. BCD has been widely used in many applications. Its original form, block coordinate minimization (BCM), dates back to the 1950s [21]. The main idea of BCM is to update a block by minimizing the original objective with respect to that block. Its convergence has been extensively studied in many different cases (cf. [20], [31] mL h and the references therein). In [25], the convergence rate of BCM was developed under the strong convexity assumption for the multi-block case, and in [4], its convergence rate was established under the general convexity assumption for the two-block case. Besides BCM, the block coordinate gradient descent (BCGD) method has also been widely studied (cf. [35]). Different from BCM, BCGD updates a block by taking a block gradient step, which is equivalent to minimizing a certain prox-linear approximation of the objective. Its global convergence was justified under the assumptions of the so-called local Lipschitzian error bound and the convexity of the non-differentiable part of the objective. In [28], a randomized block coordinate descent (RBCD) method was proposed. RBCD randomly chooses the block to update with positive probability at each iteration and is not essentially cyclic. The weak convergence was established in [28], [32], while there is no strong convergence result for RBCD.

One important subclass of BCD is the cyclic coordinate descent (CCD) algorithm. The CCD algorithm updates the iterates by the cyclic coordinate updating rule. The work [34] used cyclic updates of a fixed order and assumes block-wise convexity. In [27], a CCD algorithm was proposed for a class of non-convex penalized least squares problems. However, neither [27] nor [34] considered the CCD algorithm for the ($l_q$LS) problem. In [is|, a CCD algorithm was implemented for solving the ($l_1$LS) problem. Its convergence can be shown by referring to Q. In [33], the $l_0$LS-CD algorithm was proposed for the ($l_0$LS) problem, and its convergence to a local minimizer was also shown under certain conditions. Recently, Marjanovic and Solo [26] proposed a cyclic descent algorithm (called $l_q$CD) for the ($l_q$LS) problem with $0 < q < 1$ and $A$ being column-normalized, i.e., $\|A_i\|_2 = 1$, $i = 1, 2, \ldots, N$, where $A_i$ is the $i$-th column of $A$. They proved the subsequential convergence and, further, the convergence to a local minimizer under the so-called scalable restricted isometry property (SRIP) in [26]. In terms of the iterative form, $l_q$CD is a special case of GAITA with $A$ being column-normalized and $\mu = 1$.


1.4. Contributions 

The main contribution of this paper is the convergence analysis of GAITA for solving the ($l_q$LS) problem. The finite-step convergence of the support set and sign can be verified under the condition that the step size $\mu$ is less than $\frac{1}{\max_i \|A_i\|_2^2}$ (see Theorem 3.7). This means that the support sets and signs of the sequence $\{x^n\}$ generated by GAITA converge and remain unchanged after finitely many iterations. Such a property is very important since it provides a way to construct an auxiliary sequence, which lies in a special subspace and has the same convergence behavior as the original sequence $\{x^n\}$. Then, with the help of the Kurdyka-Łojasiewicz property (see Appendix G) of $T_\lambda$, we can verify the global convergence of GAITA under the same condition, i.e., $0 < \mu < \frac{1}{\max_i \|A_i\|_2^2}$ (see Theorem 3.10). It can be noted that this condition is generally weaker than that of JAITA (i.e., $0 < \mu < \|A\|_2^{-2}$) [1]. This gives positive answers to questions (Q1) and (Q2). The improvement of the convergence condition is important: it may improve not only the rate of convergence but also the applicability of GAITA as compared with JAITA. Furthermore, we can also justify that the proposed algorithm converges to a local minimizer under a certain second-order condition (see Theorem 3.11). More specifically, let $x^*$ be the limit point and $I$ be its support set. Then the condition can be described as: $A_I^T A_I + \lambda q(q-1)\Lambda(x_I^*)$ is positive definite, where $A_I$ represents the submatrix of $A$ with columns restricted to the index set $I$, $x_I^*$ is the subvector of $x^*$ restricted to $I$, and $\Lambda(x_I^*)$ is a diagonal matrix with $(|x_i^*|^{q-2})_{i \in I}$ as the diagonal vector. Besides this condition, we also give another two sufficient conditions to guarantee that the limit point is a local minimizer. The effectiveness, and in particular the faster convergence and weaker convergence condition of GAITA compared with JAITA, has also been demonstrated by a series of numerical experiments. All these results show that utilizing the Gauss-Seidel iteration in ITA for solving ($l_q$LS) is feasible and efficient.

1.5. Organization 

The remainder of this paper is organized as follows. Some preliminaries are given in Section 2. In Section 3, we give the convergence analysis of GAITA. In Section 4, a series of simulations is carried out to demonstrate the effectiveness of the proposed algorithm. We conclude this paper in Section 5, and present all proofs in the Appendix.


2. Preliminaries 


In this section, we present some preliminaries, which serve as the basis of the convergence 
analysis in the next section. 

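With the definition of the thresholding function (1.2), we can define a new operator $G_{\mu,\lambda\|\cdot\|_q^q}(\cdot)$ as

$$G_{\mu,\lambda\|\cdot\|_q^q}(x) = \mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}\left(x - \mu A^T(Ax - y)\right) \tag{2.1}$$

for any $x \in \mathbb{R}^N$. We denote by $\mathcal{F}_q$ the fixed point set of the operator $G_{\mu,\lambda\|\cdot\|_q^q}$, i.e.,

$$\mathcal{F}_q = \left\{x : x = G_{\mu,\lambda\|\cdot\|_q^q}(x)\right\}. \tag{2.2}$$

By the definition of $\mathrm{Prox}_{\mu,\lambda\|\cdot\|_q^q}$, a type of optimality condition for the ($l_q$LS) problem has been derived in [26].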


Lemma 2.1. (Theorem 3 in [26]). Given a point $x^*$, define the support set of $x^*$ as $\mathrm{Supp}(x^*) = \{i : x_i^* \neq 0\}$. Then $x^* \in \mathcal{F}_q$ if and only if the following three conditions hold.

(a) For $i \in \mathrm{Supp}(x^*)$, $|x_i^*| \ge \eta_{\mu,q}$.

(b) For $i \in \mathrm{Supp}(x^*)$, $A_i^T(Ax^* - y) + \lambda q\,\mathrm{sgn}(x_i^*)|x_i^*|^{q-1} = 0$.

(c) For $i \in \mathrm{Supp}(x^*)^c$, $|A_i^T(Ax^* - y)| \le \tau_{\mu,q}/\mu$.
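Conditions (a)-(c) are easy to check numerically for a candidate point. The following is a hedged sketch only: the function name `is_stationary`, the column-list storage of $A$, and the tolerance `tol` are assumptions introduced for illustration.

```python
def is_stationary(cols, y, x, lam, q, mu, tol=1e-8):
    """Check conditions (a)-(c) of Lemma 2.1 up to a tolerance.

    cols[i] is the i-th column of A (hypothetical storage choice).
    """
    eta = (2.0 * lam * mu * (1.0 - q)) ** (1.0 / (2.0 - q))
    tau = (2.0 - q) / (2.0 - 2.0 * q) * eta
    m = len(y)
    # Residual A x - y.
    r = [sum(cols[j][k] * x[j] for j in range(len(x))) - y[k] for k in range(m)]
    for i, xi in enumerate(x):
        g = sum(cols[i][k] * r[k] for k in range(m))  # A_i^T (A x - y)
        if xi != 0.0:
            if abs(xi) < eta - tol:                   # condition (a)
                return False
            sgn = 1.0 if xi > 0 else -1.0
            if abs(g + lam * q * sgn * abs(xi) ** (q - 1.0)) > tol:  # condition (b)
                return False
        elif abs(g) > tau / mu + tol:                 # condition (c)
            return False
    return True

# One-dimensional toy check with A = [1], lam = mu = 1, q = 1/2 (eta = 1, tau = 1.5):
# x = 4 satisfies (b) when y = 4.25, because 4 - 4.25 + 0.5 * 4**(-0.5) = 0.
ok_nonzero = is_stationary([[1.0]], [4.25], [4.0], 1.0, 0.5, 1.0)
ok_zero = is_stationary([[1.0]], [1.0], [0.0], 1.0, 0.5, 1.0)
bad_zero = is_stationary([[1.0]], [4.25], [0.0], 1.0, 0.5, 1.0)
```

In the toy check, $x = 0$ is accepted for $y = 1$ because $|A_1^T(Ax - y)| = 1 \le \tau_{\mu,q}/\mu = 1.5$, and rejected for $y = 4.25$.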

We call $x^*$ a stationary point of the ($l_q$LS) problem henceforth if it satisfies the optimality conditions in Lemma 2.1. Similarly, according to the definition of the operator $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(\cdot)$, (1.6), and the updating rule of GAITA (1.9)-(1.12), we can claim that $x^{n+1}$ satisfies the following property.

Property 2.2. Given the current iterate $x^n$ ($n \in \mathbb{N}$), let the index $i$ be determined via (1.9). Then $x_i^{n+1}$ satisfies either

(a) $x_i^{n+1} = 0$, or

(b) $|x_i^{n+1}| \ge \eta_{\mu,q}$ and the equation

$$A_i^T(Ax^{n+1} - y) + \lambda q\,\mathrm{sgn}(x_i^{n+1})|x_i^{n+1}|^{q-1} = \left(\frac{1}{\mu} - A_i^T A_i\right)(x_i^n - x_i^{n+1}), \tag{2.3}$$

that is, $\nabla_i T_\lambda(x^{n+1}) = \left(\frac{1}{\mu} - A_i^T A_i\right)(x_i^n - x_i^{n+1})$, where $\nabla_i T_\lambda(x^{n+1})$ represents the gradient of $T_\lambda$ with respect to the $i$-th coordinate at the point $x^{n+1}$.

As shown by Property 2.2, the coordinate-wise gradient of $T_\lambda$ with respect to the $i$-th coordinate at $x^{n+1}$ is not exactly zero but carries a relative error. This property can be easily derived from the definition of $\mathrm{prox}_{\mu,\lambda|\cdot|^q}(\cdot)$ and the specific iterative form of GAITA. More specifically, according to (1.6) and (1.11), either $x_i^{n+1} = 0$ or $|x_i^{n+1}| \ge \eta_{\mu,q}$. Moreover, when $|x_i^{n+1}| \ge \eta_{\mu,q}$, according to (1.3), $x_i^{n+1}$ is a minimizer of the optimization problem (1.3) with $z = z_i^n$. Therefore, $x_i^{n+1}$ should satisfy the following optimality condition

$$x_i^{n+1} - z_i^n + \lambda\mu q\,\mathrm{sgn}(x_i^{n+1})|x_i^{n+1}|^{q-1} = 0. \tag{2.4}$$

Plugging (1.10) into (2.4) gives

$$A_i^T(Ax^{n+1} - y) + \lambda q\,\mathrm{sgn}(x_i^{n+1})|x_i^{n+1}|^{q-1} = \frac{1}{\mu}(x_i^n - x_i^{n+1}) - A_i^T A(x^n - x^{n+1}). \tag{2.5}$$

Since $x^n$ and $x^{n+1}$ differ only in the $i$-th coordinate by (1.12), we have $A(x^n - x^{n+1}) = A_i(x_i^n - x_i^{n+1})$. Combining (1.12) and (2.5) therefore implies (2.3).

3. Convergence Analysis 

In this section, we first show the subsequential convergence of GAITA, then prove its global 
convergence, and further justify that the algorithm can converge to a local minimizer. 
3.1. Subsequential Convergence


To aid the description, we first show that the sequence $\{T_\lambda(x^n)\}$ satisfies the sufficient decrease property f|.

Property 3.1. Let $\{x^n\}$ be a sequence generated by GAITA. Assume that $0 < \mu < L_{\max}^{-1}$, then

$$T_\lambda(x^n) - T_\lambda(x^{n+1}) \ge \frac{1}{2}\left(\frac{1}{\mu} - L_{\max}\right)\|x^n - x^{n+1}\|_2^2, \quad \forall n \in \mathbb{N},$$

where $L_{\max} = \max_i \|A_i\|_2^2$.


The proof of this property is presented in Appendix 5.1. From this property, we can claim that the objective sequence $\{T_\lambda(x^n)\}$ converges, since it is non-increasing and lower bounded by 0; that is, GAITA is weakly convergent. Furthermore, if the initialization of the sequence is bounded, then from Property 3.1 one can easily derive the following boundedness and asymptotic regularity properties of the sequence.

Property 3.2. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume $0 < \mu < L_{\max}^{-1}$, then $\{x^n\}$ is bounded for any $n \in \mathbb{N}$, and

$$\sum_{k=0}^{n} \|x^{k+1} - x^k\|_2^2 \le \frac{2\mu}{1 - \mu L_{\max}}\, T_\lambda(x^0),$$

and also

$$\|x^n - x^{n+1}\|_2 \to 0, \quad \text{as } n \to +\infty.$$
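The summation bound above can be obtained from Property 3.1 by a telescoping argument. The following worked step is a sketch of that reasoning (the full proof is deferred to the Appendix by the authors); it uses only $0 < \mu < L_{\max}^{-1}$ and $T_\lambda \ge 0$.

```latex
\begin{aligned}
\sum_{k=0}^{n} \|x^{k+1}-x^{k}\|_2^2
 &\le \frac{2\mu}{1-\mu L_{\max}} \sum_{k=0}^{n} \bigl( T_\lambda(x^{k}) - T_\lambda(x^{k+1}) \bigr)
   && \text{(Property 3.1, with } \tfrac12\bigl(\tfrac1\mu - L_{\max}\bigr) = \tfrac{1-\mu L_{\max}}{2\mu}\text{)} \\
 &= \frac{2\mu}{1-\mu L_{\max}} \bigl( T_\lambda(x^{0}) - T_\lambda(x^{n+1}) \bigr)
   && \text{(telescoping sum)} \\
 &\le \frac{2\mu}{1-\mu L_{\max}}\, T_\lambda(x^{0})
   && \text{(since } T_\lambda \ge 0\text{)} .
\end{aligned}
```

Because the bound is uniform in $n$, the series of squared increments converges, so its terms must tend to zero, which is the asymptotic regularity statement.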


The boundedness of $\{x^n\}$ is mainly due to the sufficient decrease property, the coercivity of $T_\lambda$ and the boundedness assumption on the initialization, while the asymptotic regularity is mainly due to the sufficient decrease property and the boundedness of the initialization. From Properties 3.1 and 3.2, we can justify the subsequential convergence of GAITA.

Theorem 3.3. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume that $0 < \mu < L_{\max}^{-1}$, then the sequence $\{x^n\}$ has a convergent subsequence. Moreover, let $\mathcal{L}$ be the set of the limit points of $\{x^n\}$, then $\mathcal{L}$ is closed and connected.


The proof of this theorem is presented in Appendix 5.2. This theorem only shows the subsequential convergence of GAITA. Moreover, we note that $\mathcal{L}$ might not be a set of isolated points. Due to this, it becomes challenging to justify the global convergence of GAITA [41].


More specifically, there are still two questions on the convergence of the proposed algorithm:

(a) When does the algorithm converge globally? That is, under what conditions does GAITA converge strongly in the sense that the whole generated sequence, regardless of the initial point, is convergent?

(b) Where does the algorithm converge? Does the algorithm converge to a global minimizer or, more practically, to a local minimizer, given the non-convexity of the optimization problem?

3.2. Global Convergence 

In this subsection, we focus on answering the first question posed at the end of the last subsection. More specifically, we will show that the whole sequence $\{x^n\}$ generated by GAITA converges as long as the step size $\mu \in (0, L_{\max}^{-1})$.

Given the current iterate $x^n$, we define the descent function as

$$\Delta(x^n, x^{n+1}) = T_\lambda(x^n) - T_\lambda(x^{n+1}). \tag{3.1}$$

Note that $x^n$ and $x^{n+1}$ differ only in their $i$-th coefficient, which is determined by (1.9). From now on, unless stated otherwise, it is assumed that $x_i^{n+1}$ is given by (1.11) and $i$ is given by (1.9). The following lemma presents an important property of the descent function.

Lemma 3.4. Let $\{x^n\}$ be a sequence generated by GAITA. Assume that $0 < \mu < L_{\max}^{-1}$, then

$$\Delta(x^n, x^{n+1}) = 0 \text{ if and only if } x_i^{n+1} = x_i^n.$$

The proof of this lemma is straightforward. On one hand, if $x_i^{n+1} = x_i^n$, then $x^{n+1} = x^n$, and thus $\Delta(x^n, x^{n+1}) = 0$. On the other hand, if $\Delta(x^n, x^{n+1}) = 0$, then Property 3.1 implies $x^{n+1} = x^n$ and thus $x_i^{n+1} = x_i^n$.

Moreover, similar to Theorem 10 in [26], we can claim that the mapping $T(\cdot,\cdot)$ is a closed mapping, as shown below.

Lemma 3.5. $T(\cdot,\cdot)$ is a closed mapping, i.e., assume

(a) $x^n \to x^*$ as $n \to \infty$;

(b) $x_i^{n+1} \to x_i^{**}$ as $n \to \infty$, where $x_i^{n+1} = T(z_i^n, x_i^n)$.

Then $x_i^{**} = T(z_i^*, x_i^*)$, where $z_i^* = x_i^* - \mu A_i^T(Ax^* - y)$.

The proof is essentially the same as that of Theorem 10 in [26]. The only difference is that $\mathrm{prox}_{\mu,\lambda|\cdot|^q}$ is discontinuous at $\tau_{\mu,q}$ while $\mathrm{prox}_{1,\lambda|\cdot|^q}$ is discontinuous at $\tau_{1,q}$. Therefore, the closedness of the operator $T(\cdot,\cdot)$ is not affected by introducing a step size $\mu$. The following theorem shows that any limit point of the sequence $\{x^n\}$ is a stationary point of the ($l_q$LS) problem.


Theorem 3.6. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization, and $\mathcal{L}$ be its limit point set. Suppose that $0 < \mu < L_{\max}^{-1}$, then $\mathcal{L} \subset \mathcal{F}_q$.

The proof of this theorem is similar to that of Theorem 5 in [26]. For completeness, we provide the proof in Appendix 5.3.

In the following theorem, we justify the finite-step convergence of the support sets and signs of the sequence $\{x^n\}$, that is, the support sets and signs of $\{x^n\}$ converge and remain unchanged after finitely many iterations.

Theorem 3.7. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume that $0 < \mu < L_{\max}^{-1}$ and $x^*$ is any limit point of $\{x^n\}$, then there exists a sufficiently large positive integer $n^* > N$ such that when $n > n^*$, it holds that

(a) either $x_j^n = 0$ or $|x_j^n| \ge \eta_{\mu,q}$ for $j = 1, 2, \ldots, N$;

(b) $I^n = I$;

(c) $\mathrm{sgn}(x^n) = \mathrm{sgn}(x^*)$,

where $I^n = \mathrm{Supp}(x^n) = \{i : |x_i^n| \neq 0,\ i = 1, 2, \ldots, N\}$ and $I = \mathrm{Supp}(x^*)$.


The proof of this theorem is given in Appendix 5.4. From this theorem, it can be observed that when $n$ is sufficiently large, the generated sequence $\{x^n\}$ as well as its limit points lie in the same subspace $S \subset \mathbb{R}^N$, which has some special structure. This provides a way to construct an auxiliary sequence that has the same convergence behavior as the original sequence $\{x^n\}$. Thus, we only need to verify the convergence of the constructed auxiliary sequence instead of $\{x^n\}$. The construction of the auxiliary sequence is fairly standard and is motivated by [41]. In detail, the sequence can be constructed according to the following procedure.


(a) Let $n_0 = j_0 N > n^*$ for some positive integer $j_0$. Then we can define a new sequence $\{\hat{x}^n\}$ with $\hat{x}^n = x^{n_0 + n}$ for $n \in \mathbb{N}$. It is obvious that $\{\hat{x}^n\}$ has the same convergence behavior as $\{x^n\}$. Moreover, it can be noted from Theorem 3.7 that all the support sets and signs of $\{\hat{x}^n\}$ are the same.

(b) Denote by $I$ the convergent support set of the sequence $\{x^n\}$. Let $K$ be the number of elements of $I$. Without loss of generality, we assume

$$1 \le I(1) < I(2) < \cdots < I(K) \le N.$$

According to the updating rule (1.9)-(1.12) of GAITA, we can observe that many successive iterates of $\{\hat{x}^n\}$ are the same. Thus, we can merge these successive iterates into a single iterate. Moreover, the updating rule of the index is cyclic and thus periodic. As a consequence, the merging procedure can be repeated periodically. Formally, we consider such a periodic subsequence of $\{\hat{x}^n\}$ with length $N$, i.e.,

$$\{\hat{x}^{jN+I(1)}, \hat{x}^{jN+I(1)+1}, \ldots, \hat{x}^{jN+I(1)+N-1}\}$$

for $j \in \mathbb{N}$. Then for any $j \in \mathbb{N}$, we merge the $N$-length sequence $\{\hat{x}^{jN+I(1)}, \ldots, \hat{x}^{jN+I(1)+N-1}\}$ into a new $K$-length sequence $\{\bar{x}^{jK+1}, \bar{x}^{jK+2}, \ldots, \bar{x}^{jK+K}\}$ with the rule

$$\{\hat{x}^{jN+I(i)}, \ldots, \hat{x}^{jN+I(i+1)-1}\} \mapsto \bar{x}^{jK+i},$$

with $\bar{x}^{jK+i} = \hat{x}^{jN+I(i)}$ for $i = 1, 2, \ldots, K$, since $\hat{x}^{jN+I(i)+k} = \hat{x}^{jN+I(i)}$ for $k = 1, \ldots, I(i+1) - I(i) - 1$. Moreover, we merge the first $I(1)$ iterates of $\{\hat{x}^n\}$ into $\bar{x}^0$, i.e.,

$$\{\hat{x}^0, \ldots, \hat{x}^{I(1)-1}\} \mapsto \bar{x}^0,$$

with $\bar{x}^0 = \hat{x}^0$, since these iterates are invariant and equal to $\hat{x}^0$. After this procedure, we obtain a new sequence $\{\bar{x}^n\}$ with $n = jK + i$, $i = 0, \ldots, K-1$ and $j \in \mathbb{N}$. It can be observed that such a merging procedure keeps the convergence behavior of $\{\bar{x}^n\}$ the same as that of $\{\hat{x}^n\}$ and $\{x^n\}$.

(c) Furthermore, for the index set $I$, we define a projection $P_I$ as

$$P_I : \mathbb{R}^N \to \mathbb{R}^K, \quad P_I x = x_I, \quad \forall x \in \mathbb{R}^N,$$

where $x_I$ represents the subvector of $x$ restricted to the index set $I$. With this projection, a new sequence $\{u^n\}$ is constructed from the merged sequence $\{\bar{x}^n\}$ obtained in step (b) such that

$$u^n = P_I \bar{x}^n,$$

for $n \in \mathbb{N}$. Observe that $u^n$ keeps all the non-zero elements of $\bar{x}^n$ while discarding its zero elements. Moreover, this operation does not change the convergence behavior of the sequence. Therefore, the convergence behavior of $\{u^n\}$ is the same as that of $\{x^n\}$.

After the construction procedure (a)-(c), we get a new sequence $\{u^n\}$. In the following, we will prove the convergence of $\{x^n\}$ by justifying the convergence of $\{u^n\}$. Let

$$U = \{u^* : u^* = P_I x^*,\ \forall x^* \in \mathcal{L}\}.$$

Then $U$ is the corresponding limit point set of $\{u^n\}$. Furthermore, we define a new function $T$ of a single vector argument (not to be confused with the thresholding operator $T(\cdot,\cdot)$) as follows:

$$T : \mathbb{R}^K \to \mathbb{R}, \quad T(u) = T_\lambda(P_I^T u), \quad \forall u \in \mathbb{R}^K, \tag{3.2}$$

where $P_I^T$ denotes the transpose of the projection $P_I$, and is defined as

$$P_I^T : \mathbb{R}^K \to \mathbb{R}^N, \quad (P_I^T u)_I = u, \quad (P_I^T u)_{I^c} = 0, \quad \forall u \in \mathbb{R}^K.$$

Here $I^c$ represents the complementary set of $I$, i.e., $I^c = \{1, 2, \ldots, N\} \setminus I$, and $(P_I^T u)_I$ and $(P_I^T u)_{I^c}$ represent the subvectors of $P_I^T u$ restricted to $I$ and $I^c$, respectively. Let $B = A_I$, where $A_I$ denotes the submatrix of $A$ restricted to the index set $I$. Thus,

$$T(u) = \frac{1}{2}\|Bu - y\|_2^2 + \lambda\|u\|_q^q.$$
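The projection $P_I$, its transpose, and the identity $T(u) = T_\lambda(P_I^T u)$ with $B = A_I$ can be illustrated with a short sketch. The names `project`, `lift` and `objective`, the 0-based indices, and the toy data are assumptions made for illustration only.

```python
def project(x, support):
    """P_I x: keep the entries of x indexed by support (0-based indices)."""
    return [x[i] for i in support]

def lift(u, support, n):
    """P_I^T u: place the entries of u on support inside a zero vector of length n."""
    x = [0.0] * n
    for k, i in enumerate(support):
        x[i] = u[k]
    return x

def objective(cols, y, x, lam, q):
    """0.5*||A x - y||_2^2 + lam*||x||_q^q, with cols[j] the j-th column of A."""
    r = [sum(cols[j][k] * x[j] for j in range(len(x))) - y[k] for k in range(len(y))]
    return 0.5 * sum(t * t for t in r) + lam * sum(abs(t) ** q for t in x)

# Toy data (hypothetical): four columns in R^2 and a support set I = {1, 3}.
cols = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, -0.6]]
y = [1.0, 0.5]
support = [1, 3]
x = [0.0, 2.0, 0.0, -1.0]

u = project(x, support)
B = [cols[i] for i in support]              # B = A_I
t_reduced = objective(B, y, u, 0.1, 0.5)    # T(u)
t_full = objective(cols, y, lift(u, support, len(x)), 0.1, 0.5)  # T_lambda(P_I^T u)
```

The two objective values agree because the zero entries outside the support contribute nothing to either the residual or the $l_q$ term.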


After the construction procedure (a)-(c), we can observe that the following properties still hold for $\{u^n\}$.

Lemma 3.8. The sequence $\{u^n\}$ possesses the following properties:

(a) $\{u^n\}$ is updated via the following cyclic rule. Given the current iterate $u^n$, only the $i$-th coordinate will be updated while the other coordinate coefficients will be fixed at the next iteration, i.e.,

$$u_i^{n+1} = T(v_i^n, u_i^n), \tag{3.3}$$

and

$$u_j^{n+1} = u_j^n, \quad \text{for } j \neq i, \tag{3.4}$$

where $i$ is determined by

$$i = \begin{cases} K, & \text{if } 0 = (n+1) \bmod K \\ (n+1) \bmod K, & \text{otherwise} \end{cases} \tag{3.5}$$

and

$$v_i^n = u_i^n - \mu B_i^T(Bu^n - y). \tag{3.6}$$

(b) According to the updating rules (3.3)-(3.6), for $n > K$, there exist two positive integers $1 \le i_0 \le K$ and $j_0 \ge 1$ such that $n = j_0 K + i_0$ and

$$u_j^n = \begin{cases} u_j^{n-(i_0-j)}, & \text{if } 1 \le j \le i_0 \\ u_j^{n-K-(i_0-j)}, & \text{if } i_0 < j \le K. \end{cases} \tag{3.7}$$

(c) For any $n \in \mathbb{N}$,

$$u^n \in \mathbb{R}^K_{\eta_{\mu,q}^c},$$

where $\mathbb{R}_{\eta_{\mu,q}^c}$ represents a one-dimensional real subset, which is defined as

$$\mathbb{R}_{\eta_{\mu,q}^c} = \mathbb{R} \setminus (-\eta_{\mu,q}, \eta_{\mu,q}).$$

(d) Given $u^n$, if $i$ is determined by (3.5), then $u_i^{n+1}$ satisfies the following equation

$$B_i^T(Bu^{n+1} - y) + \lambda q\,\mathrm{sgn}(u_i^{n+1})|u_i^{n+1}|^{q-1} = \left(\frac{1}{\mu} - B_i^T B_i\right)(u_i^n - u_i^{n+1}). \tag{3.8}$$

That is,

$$\nabla_i T(u^{n+1}) = \left(\frac{1}{\mu} - B_i^T B_i\right)(u_i^n - u_i^{n+1}),$$

where $\nabla_i T(u^{n+1})$ represents the gradient of $T(\cdot)$ with respect to the $i$-th coordinate at the point $u^{n+1}$.

(e) $\{u^n\}$ satisfies the following sufficient decrease condition:

$$T(u^n) - T(u^{n+1}) \ge a\|u^n - u^{n+1}\|_2^2$$

for $n \in \mathbb{N}$, where $a = \frac{1}{2}\left(\frac{1}{\mu} - L_{\max}\right)$.

(f) $\|u^{n+1} - u^n\|_2 \to 0$, as $n \to \infty$.

It can be observed that the properties of $\{u^n\}$ listed in Lemma 3.8 are direct extensions of those of $\{x^n\}$. More specifically, Lemma 3.8(a) can be derived from the updating rules (1.9)-(1.12) and the construction procedure. Lemma 3.8(b) is obtained directly from the cyclic updating rule. Lemma 3.8(c) and (d) can be derived from Property 2.2(b) and the updating rules (3.3)-(3.6). Lemma 3.8(e) can be obtained from Property 3.1 and the definition of $T$ (3.2). Lemma 3.8(f) can be directly derived from Property 3.2. Besides Lemma 3.8, the following lemma shows that the gradient sequence $\{\nabla T(u^n)\}$ satisfies the so-called relative error condition [1], which is critical to the justification of the convergence of $\{u^n\}$.


Lemma 3.9. When $n > K - 1$, $\nabla T(u^{n+1})$ satisfies

$$\|\nabla T(u^{n+1})\|_2 \le b\|u^{n+1} - u^n\|_2,$$

where $b = \left(\frac{1}{\mu} + K\delta\right)\sqrt{K}$, with

$$\delta = \max_{i,j = 1, 2, \ldots, K} |B_i^T B_j|.$$


The proof of this lemma is given in Appendix 5.5. From Lemma 3.8(e), the sequence $\{u^n\}$ satisfies the sufficient decrease condition with respect to $T$, by Lemma 3.9, $\{u^n\}$ satisfies the relative error condition, and by the continuity of $T$, $\{u^n\}$ satisfies the so-called continuity condition. Furthermore, according to Q (p. 122), we know that the function

$$T(u) = \frac{1}{2}\|Bu - y\|_2^2 + \lambda\|u\|_q^q$$

is a Kurdyka-Łojasiewicz (KL) function (see Appendix 5.7). Thus, according to Theorem 2.9 in [1], $\{u^n\}$ is convergent. As a consequence, we can claim the convergence of $\{x^n\}$ as shown in the following theorem.


Theorem 3.10. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume that $0 < \mu < L_{\max}^{-1}$, then $\{x^n\}$ converges to a stationary point.
According to [1], the convergence condition of JAITA when applied to the ($l_q$LS) problem is $0 < \mu < \|A\|_2^{-2}$. It can be noted that $\max_i \|A_i\|_2^2 \le \|A\|_2^2$, and hence the condition in Theorem 3.10 is generally weaker than that of JAITA. Moreover, as shown by Fig. 1, such an improvement of the convergence condition is solid and essential in the sense that there exists a step size $\mu \in (\|A\|_2^{-2}, L_{\max}^{-1})$ such that JAITA diverges while GAITA converges with this given step size.

Suppose that $A$ is column-normalized, i.e., $\|A_i\|_2 = 1$ for any $i$, then $L_{\max} = 1$, and thus the condition of GAITA becomes $0 < \mu < 1$. In this setting, if further $\mu = 1$, then GAITA reduces to the $l_q$CD algorithm [26] from the perspective of the iterative form. However, only the subsequential convergence of the $l_q$CD algorithm can be claimed in [26] if there is no additional requirement on $A$. Compared with the $l_q$CD algorithm, there are two main improvements. The first one is that we extend the column-normalized $A$ to a general $A$. This extension of the model improves the flexibility and applicability of GAITA. The second, and more important, one is that the global convergence of GAITA can be established. It gives a solid theoretical guarantee for the use of GAITA.
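To make the gap between the two step-size conditions concrete, the following sketch computes both upper bounds, $\|A\|_2^{-2}$ for JAITA and $L_{\max}^{-1} = 1/\max_i\|A_i\|_2^2$ for GAITA, on a random column-normalized matrix. The function name `step_size_bounds`, the random seed and the matrix size are illustrative assumptions, not part of the paper.

```python
import numpy as np

def step_size_bounds(A):
    """Return (1/||A||_2^2, 1/max_i ||A_i||_2^2), the JAITA and GAITA step-size bounds."""
    spectral_norm_sq = np.linalg.norm(A, 2) ** 2   # largest singular value, squared
    l_max = np.max(np.sum(A * A, axis=0))          # largest squared column norm
    return 1.0 / spectral_norm_sq, 1.0 / l_max

# Illustrative matrix (assumed setup): Gaussian entries, then column normalization.
rng = np.random.default_rng(0)
A = rng.normal(0.0, np.sqrt(1.0 / 250), size=(250, 500))
A /= np.linalg.norm(A, axis=0)

jaita_bound, gaita_bound = step_size_bounds(A)
print("JAITA bound:", jaita_bound, "GAITA bound:", gaita_bound)
```

Since $\max_i\|A_i\|_2^2 \le \|A\|_2^2$ always holds, the first returned value can never exceed the second; for a wide matrix the two typically differ by a sizeable factor.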


3.3. Convergence to A Local Minimizer 

In this subsection, we mainly answer the second open question posed at the end of Subsection 3.1. More specifically, we will justify that GAITA converges to a local minimizer of the ($l_q$LS) problem under certain conditions.

Theorem 3.11. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume that $0 < \mu < L_{\max}^{-1}$, and $x^*$ is the convergent point of $\{x^n\}$. Let $I = \mathrm{Supp}(x^*)$, and $K = \|x^*\|_0$. Then $x^*$ is a (strictly) local minimizer of $T_\lambda$ if the following condition holds:

$$A_I^T A_I + \lambda q(q-1)\Lambda(x_I^*) \succ 0, \tag{3.9}$$

where $A_I$ represents the submatrix of $A$ with columns restricted to $I$, $x_I^*$ is the subvector of $x^*$ restricted to $I$, $\Lambda(x_I^*) \in \mathbb{R}^{K\times K}$ is a diagonal matrix with $(|x_i^*|^{q-2})_{i\in I}$ as the diagonal vector, and $M \succ 0$ represents that $M$ is positive definite for any matrix $M$.
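Condition (3.9) is easy to test numerically once a candidate limit point is available. The sketch below builds the matrix $A_I^T A_I + \lambda q(q-1)\Lambda(x_I^*)$ and checks its smallest eigenvalue; the function name `satisfies_condition_3_9` and the tolerance argument are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def satisfies_condition_3_9(A, x_star, lam, q, tol=0.0):
    """Check whether A_I^T A_I + lam*q*(q-1)*Lambda(x_I^*) is positive definite."""
    support = np.flatnonzero(x_star)                 # I = Supp(x^*)
    if support.size == 0:
        raise ValueError("x_star must have a non-empty support")
    A_I = A[:, support]
    # Diagonal matrix with entries |x_i^*|^(q-2), i in I.
    Lambda = np.diag(np.abs(x_star[support]) ** (q - 2.0))
    M = A_I.T @ A_I + lam * q * (q - 1.0) * Lambda
    # M is symmetric, so eigvalsh applies; positive definite iff min eigenvalue > 0.
    return float(np.linalg.eigvalsh(M).min()) > tol

# Small illustrative case: identity design, two nonzero coefficients.
print(satisfies_condition_3_9(np.eye(3), np.array([1.0, 0.0, 2.0]), lam=0.1, q=0.5))
```

Because $q(q-1) < 0$, the regularizer always subtracts curvature on the support, so the check can fail when $\lambda$ is large or when some nonzero entry of $x^*$ is very small.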


The proof of this theorem is given in Appendix 5.6. Intuitively, under the condition of Theorem 3.11, the principal submatrix of the Hessian matrix of $T_\lambda$ at $x^*$ restricted to the index set $I$ is positive definite. Moreover, by Lemma 2.1(b), the following first-order optimality condition holds

$$A_I^T(Ax^* - y) + \lambda\phi_1(x_I^*) = 0,$$

where $\phi_1(x_I^*) = (q\,\mathrm{sgn}(x^*_{i_1})|x^*_{i_1}|^{q-1}, \cdots, q\,\mathrm{sgn}(x^*_{i_K})|x^*_{i_K}|^{q-1})^T$, and $i_j \in I$, $j = 1, \cdots, K$. These two conditions imply that the second-order optimality conditions hold at $x^* = (x_I^*, 0)$. For any sufficiently small $h$, let $x^h = x^* + h$, then

$$\begin{aligned}
T_\lambda(x^h) &= \frac{1}{2}\|A_I x_I^h - y + A_{I^c}h_{I^c}\|_2^2 + \lambda\sum_{i\in I}|x_i^h|^q + \lambda\sum_{i\in I^c}|h_i|^q \\
&= \frac{1}{2}\|A_I x_I^h - y\|_2^2 + \lambda\sum_{i\in I}|x_i^h|^q \\
&\quad + \frac{1}{2}\|A_{I^c}h_{I^c}\|_2^2 + \langle h_{I^c}, A_{I^c}^T(A_I x_I^h - y)\rangle + \lambda\sum_{i\in I^c}|h_i|^q.
\end{aligned}$$

Denote $T_{I^c} = \frac{1}{2}\|A_{I^c}h_{I^c}\|_2^2 + \langle h_{I^c}, A_{I^c}^T(A_I x_I^h - y)\rangle + \lambda\sum_{i\in I^c}|h_i|^q$. Then

$$\begin{aligned}
T_\lambda(x^h) &\ge T_\lambda(x^*) + T_{I^c} \\
&\ge T_\lambda(x^*) + \frac{1}{2}\|A_{I^c}h_{I^c}\|_2^2 + \sum_{i\in I^c}\left(\lambda|h_i|^q - \|A_{I^c}^T(A_I x_I^h - y)\|_\infty|h_i|\right),
\end{aligned}$$

where the first inequality holds by the optimality at $x^* = (x_I^*, 0)$, which gives $\frac{1}{2}\|A_I x_I^h - y\|_2^2 + \lambda\sum_{i\in I}|x_i^h|^q \ge T_\lambda(x^*)$. It can be observed that if $h_{I^c}$ is sufficiently small, then the last part of the above inequality is nonnegative, because $|h_i|^q$ dominates $|h_i|$ near zero for $0 < q < 1$. Therefore, $x^*$ should be a local minimizer.

Furthermore, we can derive another two sufficient conditions by taking advantage of the specific form of the threshold value (1.8). Let $e = \min_{i\in I}|x_i^*|$. Note that

$$\lambda_{\min}\left(A_I^T A_I + \lambda q(q-1)\Lambda(x_I^*)\right) \ge \lambda_{\min}(A_I^T A_I) + \lambda q(q-1)e^{q-2},$$

where $\lambda_{\min}(M)$ represents the minimal eigenvalue of a given matrix $M$. Thus, if

$$\lambda_{\min}(A_I^T A_I) > 0 \quad\text{and}\quad 0 < \lambda < \frac{\lambda_{\min}(A_I^T A_I)\,e^{2-q}}{q(1-q)}, \tag{3.10}$$

then the condition of Theorem 3.11 holds naturally.

Moreover, by (1.8), it holds

$$e \ge \eta_{\mu,q} = (2\lambda\mu(1-q))^{\frac{1}{2-q}}. \tag{3.11}$$

Thus, if $\frac{\lambda_{\min}(A_I^T A_I)}{\max_i\|A_i\|_2^2} > \frac{q}{2}$ and

$$\frac{q}{2\lambda_{\min}(A_I^T A_I)} < \mu < \frac{1}{\max_i\|A_i\|_2^2}, \tag{3.12}$$

then the condition (3.10) holds and thus (3.9) also holds. According to the above analysis, we can easily obtain the following theorem.

Theorem 3.12. Let $\{x^n\}$ be a sequence generated by GAITA with a bounded initialization. Assume that $0 < \mu < L_{\max}^{-1}$ and $x^*$ is the convergent point of $\{x^n\}$. Let $I = \mathrm{Supp}(x^*)$, $K = \|x^*\|_0$, and $e = \min_{i\in I}|x_i^*|$. Then $x^*$ is a (strictly) local minimizer of $T_\lambda$ if either of the following two conditions is satisfied:

(a) $\lambda_{\min}(A_I^T A_I) > 0$, $0 < \lambda < \frac{\lambda_{\min}(A_I^T A_I)\,e^{2-q}}{q(1-q)}$;

(b) $\frac{\lambda_{\min}(A_I^T A_I)}{\max_i\|A_i\|_2^2} > \frac{q}{2}$, $\frac{q}{2\lambda_{\min}(A_I^T A_I)} < \mu < \frac{1}{\max_i\|A_i\|_2^2}$.
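The passage from (3.10) and (3.11) to condition (b) is only stated in the text. A sketch of the algebra it appears to rely on, using only the symbols already introduced, is:

```latex
\begin{aligned}
e \ge \eta_{\mu,q}
  \;&\Longrightarrow\; e^{2-q} \ge 2\lambda\mu(1-q)
  \;\Longrightarrow\;
  \frac{\lambda_{\min}(A_I^T A_I)\,e^{2-q}}{q(1-q)}
  \ge \frac{2\lambda\mu\,\lambda_{\min}(A_I^T A_I)}{q}, \\
\mu > \frac{q}{2\lambda_{\min}(A_I^T A_I)}
  \;&\Longrightarrow\;
  \lambda < \frac{2\lambda\mu\,\lambda_{\min}(A_I^T A_I)}{q}
  \le \frac{\lambda_{\min}(A_I^T A_I)\,e^{2-q}}{q(1-q)} .
\end{aligned}
```

The second line is exactly (3.10). The interval for $\mu$ in (3.12) is non-empty precisely when $\frac{q}{2\lambda_{\min}(A_I^T A_I)} < \frac{1}{\max_i\|A_i\|_2^2}$, which is the ratio requirement $\frac{\lambda_{\min}(A_I^T A_I)}{\max_i\|A_i\|_2^2} > \frac{q}{2}$.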


Intuitively, the condition (a) in Theorem 3.12 means that if the smooth part of the ($l_q$LS) problem is strictly convex on the support and the regularization parameter is sufficiently small, then the convexity of $T_\lambda$ at $x^*$ can be guaranteed by the convexity of the smooth part. Suppose that $A$ is column-normalized, i.e., $\|A_i\|_2 = 1$ for any $i$, then the condition (b) in Theorem 3.12 intuitively implies that if the smooth part of the ($l_q$LS) problem is strongly convex, then the local convexity of $T_\lambda$ at $x^*$ can be guaranteed as long as the step size $\mu$ is chosen appropriately. Similar conditions are also derived for the iterative half thresholding algorithm for solving the ($l_q$LS) problem with $q = 1/2$ (see Theorems 1 and 2 in the corresponding reference). However, the conditions in this theorem are a little weaker than those in that reference.

In [26], the convergence of the $l_q$CD algorithm to a local minimizer is justified under a certain scalable restricted isometry property (SRIP). SRIP is defined as follows.

Definition 3.13. (SRIP [26]). We say $A$ has the SRIP($p, \phi, \alpha$) if there exist $\nu_\phi, \gamma_\phi > 0$ satisfying $\gamma_\phi/\nu_\phi < \alpha$ such that

$$\nu_\phi\|x\|_2 \le \|Ax\|_2 \le \gamma_\phi\|x\|_2$$

holds for every $x \in B_p(\phi) := \{x : \|x\|_p \le \phi\}$, and $\|\cdot\|_p := \|\cdot\|_0$ for $p = 0$.


Roughly speaking, $\nu_\phi$ and $\gamma_\phi$ can be viewed as some type of minimal and maximal singular values of $A$, respectively. Thus, SRIP essentially indicates that $A$ possesses a good condition number. With the definition of SRIP, [26] demonstrates that if $A$ has the SRIP($p, \phi, \alpha$) with some $p \ge 0$, then for any $0 < q < q^*$ (where $q^* := \min\{1, 2/\alpha^2\}$), the $l_q$CD algorithm converges to a local minimizer. In particular, when $\alpha = \sqrt{2}$, that is, $\gamma_\phi/\nu_\phi < \sqrt{2}$, the $l_q$CD algorithm converges to a local minimizer for any $0 < q < 1$. In other words, if

$$0 < q < \min\left\{1, \frac{2\nu_\phi^2}{\gamma_\phi^2}\right\}, \tag{3.13}$$

then the $l_q$CD algorithm can converge to a local minimizer. It can be seen from Theorem 3.12 that the condition (b) is equivalent to

$$0 < q < \min\left\{1, \frac{2\lambda_{\min}(A_I^T A_I)}{\max_i\|A_i\|_2^2}\right\}. \tag{3.14}$$

It is generally hard to compare the conditions (3.13) and (3.14) directly. However, if $p = 0$, then SRIP may reduce to the standard restricted isometry property (RIP), and in this case, if further $\phi = K$ (where $K$ is the cardinality of the support set of $x^*$), then

$$\lambda_{\min}(A_I^T A_I) \ge \nu_K^2 \quad\text{and}\quad \max_i\|A_i\|_2^2 \le \gamma_K^2.$$

Therefore,

$$\frac{\lambda_{\min}(A_I^T A_I)}{\max_i\|A_i\|_2^2} \ge \frac{\nu_K^2}{\gamma_K^2},$$

which implies that our conditions for convergence to a local minimizer are generally weaker than that of the $l_q$CD algorithm in terms of the SRIP.
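The right-hand side of (3.14) depends only on $A$ and the support $I$, so it can be evaluated directly. The following sketch does this; the function name `q_upper_bound` and the small test matrix are assumptions made for illustration.

```python
import numpy as np

def q_upper_bound(A, support):
    """Evaluate min{1, 2*lambda_min(A_I^T A_I) / max_i ||A_i||_2^2} from (3.14)."""
    A_I = A[:, support]
    lam_min = float(np.linalg.eigvalsh(A_I.T @ A_I).min())   # smallest eigenvalue on the support
    l_max = float(np.max(np.sum(A * A, axis=0)))             # max squared column norm of A
    return min(1.0, 2.0 * lam_min / l_max)

# Two unit-norm columns with inner product 0.9: lambda_min = 0.1, so the bound is 0.2.
A_demo = np.array([[1.0, 0.9],
                   [0.0, np.sqrt(1.0 - 0.81)]])
print(q_upper_bound(A_demo, [0, 1]))
```

Highly correlated columns on the support shrink $\lambda_{\min}(A_I^T A_I)$ and therefore the admissible range of $q$, while orthonormal columns give the full range $0 < q < 1$.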


4. Numerical Experiments 


In this section, we demonstrate the effects of the algorithmic parameters on the performance of GAITA. In particular, we mainly focus on the effect of the step size parameter, while for the effects of the regularization parameter $\lambda$ and of $q$ we refer to [26]. Moreover, a series of experiments are conducted to show the faster convergence as well as the weaker convergence condition of GAITA as compared with JAITA.


4.1. On effect of $\mu$

For this purpose, we considered the performance of GAITA for the sparse signal recovery problem, i.e., $y = Ax + \epsilon$, where $x \in \mathbb{R}^N$ was an unknown sparse signal, $A \in \mathbb{R}^{m\times N}$ was the measurement matrix, $y \in \mathbb{R}^m$ was the corresponding measurement vector, $\epsilon$ was the noise and generally $m < N$. The aim of this problem was to recover the sparse signal $x$ from $y$. In these experiments, we set $m = 250$, $N = 500$ and $k^* = 15$, where $k^*$ was the sparsity level of the original sparse signal. The original sparse signal $x^*$ was generated randomly according to the standard Gaussian distribution. $A$ was of dimension $m \times N = 250 \times 500$ with Gaussian $\mathcal{N}(0, 1/250)$ i.i.d. entries and was preprocessed via column-normalization, i.e., $\|A_i\|_2 = 1$ for any $i$. The observation $y$ was corrupted with 30 dB noise. With these settings, the convergence condition of GAITA becomes $0 < \mu < 1$. To examine the effect of the step size, we varied $\mu$ from 0 to 1, and considered different $q$, that is, $q = 0.1, 0.3, 0.5, 0.7, 0.9$. The termination rule of GAITA was that the recovery mean square error (RMSE) be less than a given precision tol (in this case, tol $= 10^{-2}$). The regularization parameter $\lambda$ was set as 0.009 and fixed for all experiments. The experiment results are shown in Fig. 2.

[Figure 2, panels: (a) Recovery Error; (b) Computational Time.]

Figure 2: Experiment for the justification of the effect of the step size parameter $\mu$ on the performance of GAITA with different $q$. (a) The trends of recovery error of GAITA with different $q$. (b) The trends of the computational time of GAITA with different $q$.

From Fig. 2, we can observe that the step size parameter $\mu$ has almost no influence on the recovery quality of the proposed algorithm (as shown in Fig. 2(a)) while it significantly affects the time efficiency of the proposed algorithm (as shown in Fig. 2(b)). Basically, a larger step size implies faster convergence, which coincides with the common consensus. Therefore, in practice, we suggest a larger step size like $0.95/L_{\max}$ for GAITA.
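The data generation described in this subsection can be sketched as follows. The function name `make_sparse_recovery_problem`, the seed, and the reading of "30 dB noise" as a signal-to-noise ratio of 30 dB on the clean measurements are assumptions; the paper does not spell out these details.

```python
import numpy as np

def make_sparse_recovery_problem(m=250, N=500, k=15, snr_db=30.0, seed=0):
    """Generate (A, x_true, y) with column-normalized Gaussian A and a k-sparse x_true."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, N))   # N(0, 1/m) i.i.d. entries
    A /= np.linalg.norm(A, axis=0)                       # enforce ||A_i||_2 = 1
    x_true = np.zeros(N)
    support = rng.choice(N, size=k, replace=False)       # random support of size k
    x_true[support] = rng.standard_normal(k)             # standard Gaussian amplitudes
    clean = A @ x_true
    noise = rng.standard_normal(m)
    # Scale the noise so that 20*log10(||clean|| / ||noise||) equals snr_db (assumed convention).
    noise *= np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20.0))
    return A, x_true, clean + noise

A, x_true, y = make_sparse_recovery_problem()
print(A.shape, np.count_nonzero(x_true), y.shape)
```

With column normalization, $L_{\max} = 1$, so any $\mu \in (0, 1)$ satisfies the GAITA condition for data generated this way.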


4.2. Comparison with JAITA
4.2.1. Faster Convergence

We conducted an experiment to demonstrate the faster convergence of GAITA as compared with JAITA [38], [41]. For this purpose, given a sparse signal $x$ with dimension $N = 500$ and sparsity $k^* = 15$, shown in Fig. 3(b), we considered the signal recovery problem through the observation $y = Ax$, where the measurement matrix $A$ and the original sparse signal $x$ were generated in the same way as in Section 4.1. We then applied GAITA and JAITA to the ($l_q$LS) problem with two different $q$, that is, $q = 1/2$ and $2/3$, respectively. In both cases, we took $\lambda = 0.001$, $\mu = \frac{0.95}{\max_i\|A_i\|_2^2}\ (= 0.95)$ for GAITA and $\mu = 0.99\|A\|_2^{-2}\ (= 0.1676)$ for JAITA. Moreover, the initial guess was 0. For a fair comparison, we took every $N$ inner iterations of GAITA as one iteration, since in this case all coordinates were updated exactly once. The experiment results are reported in Fig. 3.

[Figure 3, panels: (a) Iteration error; (b) Recovery signal.]

Figure 3: Experiment for convergence rate. (a) The trend of iteration error, i.e., $\|x^n - x^*\|_2$. (b) Recovery signal. The recovery MSEs of the four cases, that is, GAITA ($q = 1/2$), GAITA ($q = 2/3$), JAITA ($q = 1/2$) and JAITA ($q = 2/3$) are $2.06 \times 10^{-8}$, $5.14 \times 10^{-9}$, $2.12 \times 10^{-8}$ and $5.28 \times 10^{-9}$, respectively.

Fig. 3(a) shows how the iteration error ($\|x^n - x^*\|_2$) varies. It can be observed that GAITA converges much more rapidly than JAITA in both cases. As shown in Fig. 3(a), the number of iterations needed for GAITA is about 150 in both cases, while many more iterations are required for JAITA (about 1500 and 1700 iterations for $q = 1/2$ and $2/3$, respectively). As justified in [40], [41], JAITA possesses an eventually linear convergence rate, that is, JAITA converges linearly after a certain number of iterations. From Fig. 3(a), a similar eventually linear convergence rate of GAITA can be observed. Also, compared with JAITA, far fewer iterations are required before this linear decay starts. Moreover, Fig. 3(b) shows that the original sparse signal is recovered by both GAITA and JAITA with very high accuracy. This experiment clearly shows the faster convergence as well as the eventually linear convergence rate of GAITA.
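The two update rules compared above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the scalar thresholding step is computed here by bisection on the first-order condition $u - |z| + \tau q u^{q-1} = 0$ (with $\tau = \lambda\mu$) instead of any closed-form operator, and the names `lq_threshold`, `gaita`, `jaita` and `objective` are hypothetical.

```python
import numpy as np

def lq_threshold(z, tau, q, iters=60):
    """Scalar minimizer of 0.5*(u - z)**2 + tau*|u|**q for 0 < q < 1 (bisection sketch)."""
    eta = (2.0 * tau * (1.0 - q)) ** (1.0 / (2.0 - q))   # smallest nonzero output magnitude
    thresh = eta * (2.0 - q) / (2.0 * (1.0 - q))         # inputs with |z| <= thresh map to 0
    a = abs(z)
    if a <= thresh:
        return 0.0
    lo, hi = eta, a                                      # the stationary point lies in [eta, |z|]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - a + tau * q * mid ** (q - 1.0) > 0.0:
            hi = mid
        else:
            lo = mid
    return float(np.sign(z)) * 0.5 * (lo + hi)

def objective(A, y, x, lam, q):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x) ** q)

def gaita(A, y, lam, q, mu, sweeps):
    """Gauss-Seidel rule: coordinates are updated cyclically, one at a time."""
    N = A.shape[1]
    x = np.zeros(N)
    r = -y.astype(float)                                 # residual A x - y at x = 0
    history = [objective(A, y, x, lam, q)]
    for _ in range(sweeps):                              # one sweep = N inner iterations
        for i in range(N):
            z = x[i] - mu * (A[:, i] @ r)
            new = lq_threshold(z, lam * mu, q)
            r += A[:, i] * (new - x[i])                  # keep the residual in sync
            x[i] = new
        history.append(objective(A, y, x, lam, q))
    return x, history

def jaita(A, y, lam, q, mu, iters):
    """Jacobi rule: all coordinates are updated simultaneously from the same iterate."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - mu * (A.T @ (A @ x - y))
        x = np.array([lq_threshold(zi, lam * mu, q) for zi in z])
    return x
```

Under the step-size condition $0 < \mu < L_{\max}^{-1}$, the list `history` returned by `gaita` should be non-increasing up to the bisection accuracy, in line with the sufficient decrease property of Property 3.1.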


4.2.2. Weaker Condition

We conducted a set of experiments to demonstrate that the convergence condition of GAITA is weaker than that of JAITA. The experiment setting was the same as in the above subsection. We then applied GAITA and JAITA to the ($l_q$LS) problem with $q = 1/2$. In this setting, the theoretical condition for convergence of JAITA is $\mu \in (0, 0.1759)$ while the associated condition of GAITA is $\mu \in (0, 1)$. We used different $\mu$ (i.e., $\mu = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1$) for both GAITA and JAITA. The objective function sequences are shown in Fig. 4.

[Figure 4, panels: (a) Convergence of GAITA; (b) Detail of GAITA; (c) Divergence of JAITA; (d) Detail of JAITA. Horizontal axis: number of iterations $n$; one curve for each $\mu = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1$.]

Figure 4: An experiment that verifies the weaker convergence condition of GAITA as compared with JAITA. (a) The trends of the objective function sequences, i.e., $\{T_\lambda(x^n)\}$ of GAITA with different $\mu$. (b) The detailed trends of the objective function sequences of GAITA with different $\mu$. (c) The trends of the objective function sequences of JAITA with different $\mu$. (d) The detailed trends of the objective function sequences of JAITA with different $\mu$. The regularization parameter $\lambda$ was taken as 0.001 in all cases.

From Fig. 4, the objective sequences of JAITA diverge for all $\mu$, while those of GAITA are convergent. These can be observed in detail from Fig. 4(b) and (d), respectively. By Fig. 4(a), the objective sequences of GAITA converge within about 400 iterations in all cases, while those of JAITA diverge rapidly as shown by Fig. 4(d). When $\mu = 1$ and $A$ is column-normalized, GAITA reduces to the $l_q$CD method. Fig. 4(a) and (b) show that the objective sequence of the $l_q$CD method is convergent, which is in fact guaranteed by Property 3.1. It implies that the $l_q$CD method is weakly convergent as justified in [26]. However, different from GAITA, the global convergence of the $l_q$CD method has not been justified if there is no additional condition.

5. Conclusion


In this paper, we applied the Gauss-Seidel iteration rule to the iterative thresholding algorithm for the non-convex $l_q$ regularized least squares regression problem and developed a new algorithm called GAITA. The main contribution of this paper is the establishment of the convergence of the proposed algorithm. In summary, we have verified that

(i) GAITA has the finite step convergence of the support set and sign as long as the step size satisfies $0 < \mu < 1/L_{\max}$. It means that the support sets and signs of the sequence generated by GAITA converge and remain the same after finitely many iterations.

(ii) Under the same condition, the global convergence of GAITA can be justified. Compared with JAITA, such as the half algorithm for $l_{1/2}$ regularization, the convergence condition of GAITA is weaker than that of JAITA (i.e., $0 < \mu < \|A\|_2^{-2}$).

(iii) If a certain second-order condition is satisfied at the limit point, then the limit point is indeed a local minimizer. Thus, under these conditions, the proposed algorithm converges to a local minimizer.

(iv) Several numerical experiments are implemented to demonstrate the effectiveness of GAITA, in particular the expected faster convergence and the desired weaker convergence condition than JAITA. Also, a similar eventually linear convergence rate of GAITA can be observed. However, this rate of convergence of GAITA has not been justified in the current paper, and we will study it in future work.

When it comes to parallel implementation, however, GAITA could have certain disadvantages because variables that depend on each other can only be updated sequentially.

Appendix 

Most of the proofs and the description of the Kurdyka-Lojasiewicz inequality are presented in the Appendix.

5.1. Proof of Property 3.1

Proof. Given the current iterate $x^n$, let the coefficient index $i$ be determined according to (1.9). According to (1.3) and (1.11),

$$x_i^{n+1} \in \arg\min_{v\in\mathbb{R}}\left\{\frac{|z_i^n - v|^2}{2} + \lambda\mu|v|^q\right\},$$

where $z_i^n = x_i^n - \mu A_i^T(Ax^n - y)$. Then it implies

$$\frac{1}{2}|\mu A_i^T(Ax^n - y)|^2 + \lambda\mu|x_i^n|^q \ge \frac{1}{2}|(x_i^{n+1} - x_i^n) + \mu A_i^T(Ax^n - y)|^2 + \lambda\mu|x_i^{n+1}|^q.$$

Some simplifications give

$$\lambda|x_i^n|^q - \lambda|x_i^{n+1}|^q \ge \frac{|x_i^{n+1} - x_i^n|^2}{2\mu} + A_i^T(Ax^n - y)(x_i^{n+1} - x_i^n). \tag{5.1}$$

Moreover, since $x_j^{n+1} = x_j^n$ for any $j \ne i$, (5.1) becomes

$$\lambda\|x^n\|_q^q - \lambda\|x^{n+1}\|_q^q \ge \frac{\|x^{n+1} - x^n\|_2^2}{2\mu} + \langle Ax^n - y, A(x^{n+1} - x^n)\rangle. \tag{5.2}$$

Adding $\frac{1}{2}\|Ax^n - y\|_2^2 - \frac{1}{2}\|Ax^{n+1} - y\|_2^2$ to both sides of (5.2) gives

$$\begin{aligned}
T_\lambda(x^n) - T_\lambda(x^{n+1}) &\ge \frac{\|x^{n+1} - x^n\|_2^2}{2\mu} - \frac{1}{2}\|A(x^n - x^{n+1})\|_2^2 \\
&= \frac{\|x^{n+1} - x^n\|_2^2}{2\mu} - \frac{1}{2}(A_i^T A_i)\|x^n - x^{n+1}\|_2^2 \\
&\ge \frac{1}{2}\left(\frac{1}{\mu} - L_{\max}\right)\|x^n - x^{n+1}\|_2^2,
\end{aligned} \tag{5.3}$$

where the first equality holds because

$$\|A(x^n - x^{n+1})\|_2^2 = (A_i^T A_i)|x_i^n - x_i^{n+1}|^2 = (A_i^T A_i)\|x^n - x^{n+1}\|_2^2,$$

and the second inequality holds because $A_i^T A_i \le L_{\max}$. ■
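The step that turns (5.2) into the first line of (5.3) rests on a short identity. Writing $r = Ax^n - y$ and $d = A(x^{n+1} - x^n)$ (shorthand introduced here only for this sketch), so that $Ax^{n+1} - y = r + d$:

```latex
\frac{1}{2}\|r\|_2^2 - \frac{1}{2}\|r + d\|_2^2 + \langle r, d\rangle
  = \frac{1}{2}\|r\|_2^2 - \frac{1}{2}\|r\|_2^2 - \langle r, d\rangle - \frac{1}{2}\|d\|_2^2 + \langle r, d\rangle
  = -\frac{1}{2}\|d\|_2^2 .
```

Hence the right-hand side of (5.2), after adding the two quadratic terms, equals $\frac{\|x^{n+1} - x^n\|_2^2}{2\mu} - \frac{1}{2}\|A(x^n - x^{n+1})\|_2^2$.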


5.2. Proof of Theorem 3.3

Proof. By Property 3.1, we know that $\{T_\lambda(x^n)\}$ is a decreasing and lower-bounded sequence, thus $\{T_\lambda(x^n)\}$ is convergent. Denote the limit of $\{T_\lambda(x^n)\}$ as $T^*$. Moreover, by Property 3.2, $\{x^n\}$ is bounded, and by the continuity of $T_\lambda(\cdot)$, there exists a subsequence $\{x^{n_j}\}$ of $\{x^n\}$ converging to some point $x^*$, which satisfies $T_\lambda(x^*) = T^*$.

Furthermore, by Property 3.2 and Ostrowski's result (Theorem 26.1, p. 173) [30], the limit point set $\mathcal{C}$ of the sequence $\{x^n\}$ is closed and connected. ■


5.3. Proof of Theorem 3.6

Proof. Since the sequence $\{x^n\}$ is bounded, it has limit points. Let $x^* \in \mathcal{C}$. We now focus on the $i$-th coefficient of the sequence with $n = n(i) = jN + i - 1$, where $i = 1, 2, \ldots, N$ and $j = 0, 1, \ldots$. However, here we simply write $n$, by which we mean $n(i)$. Now there exists a subsequence $\{x^{n_1}, x^{n_2}, \cdots\}$ such that

$$\{x^{n_1}, x^{n_2}, \cdots\} \to x^* \quad\text{and}\quad \{x_i^{n_1}, x_i^{n_2}, \cdots\} \to x_i^*. \tag{5.4}$$

Moreover, since the sequence $\{x^{n_1+1}, x^{n_2+1}, \cdots\}$ is also bounded, it also has limit points. Denoting one of these by $x^{**}$, there exists a subsequence $\{x^{l_1+1}, x^{l_2+1}, \cdots\}$ such that

$$\{x^{l_1+1}, x^{l_2+1}, \cdots\} \to x^{**} \quad\text{and}\quad \{x_i^{l_1+1}, x_i^{l_2+1}, \cdots\} \to x_i^{**}, \tag{5.5}$$

where $\{l_1, l_2, \cdots\} \subset \{n_1, n_2, \cdots\}$. In this case, it holds

$$\{x^{l_1}, x^{l_2}, \cdots\} \to x^* \quad\text{and}\quad \{x_i^{l_1}, x_i^{l_2}, \cdots\} \to x_i^*, \tag{5.6}$$

since it is a subsequence of (5.4). From (1.10) and (5.6), we have

$$z_i^{l_j} \to z_i^* \quad\text{as } j \to \infty.$$

Thus, by Lemma 3.5, it holds

$$x_i^{**} = \mathcal{T}(z_i^*, x_i^*). \tag{5.7}$$

Moreover, by (5.5), (5.6) and (1.12), it holds

$$x_j^* = x_j^{**} \quad\text{for } j \ne i. \tag{5.8}$$

In the following, by the continuity of $T_\lambda(\cdot)$ and thus the continuity of $\Delta(\cdot, \cdot)$ with respect to its arguments, it holds

$$\Delta(x^{l_j}, x^{l_j+1}) \to \Delta(x^*, x^{**}).$$

Moreover, since the sequence $\{T_\lambda(x^n)\}$ is convergent, then

$$\Delta(x^{l_j}, x^{l_j+1}) = T_\lambda(x^{l_j}) - T_\lambda(x^{l_j+1}) \to 0 \quad\text{as } j \to \infty,$$

which implies

$$\Delta(x^*, x^{**}) = 0.$$

Furthermore, by Lemma 3.4 and (5.7)-(5.8), it holds

$$x_i^* = x_i^{**}. \tag{5.9}$$

Combining (5.7) and (5.9), we have

$$x_i^* = \mathcal{T}(z_i^*, x_i^*). \tag{5.10}$$

Since $i$ is arbitrary, (5.10) holds for all $i \in \{1, \cdots, N\}$. It implies that $x^*$ is a fixed point, that is, $x^* \in \mathcal{F}_q$. Similarly, since $x^* \in \mathcal{C}$ is also arbitrary, we have $\mathcal{C} \subset \mathcal{F}_q$. Consequently, we complete the proof of this theorem. ■


5.4. Proof of Theorem 3.7

Proof. We can note that all the coefficient indices will be updated at least once when $n > N$. By Property 2.2, once the index $i$ is updated at the $n$-th iteration, the coefficient $x_i^n$ satisfies:

$$\text{either } x_i^n = 0 \text{ or } |x_i^n| \ge \eta_{\mu,q}.$$

Thus, Theorem 3.7(a) holds.

In the following, we prove Theorem 3.7(b) and (c). By the assumption of Theorem 3.7, there exists a subsequence $\{x^{n_j}\}$ converging to $x^*$, i.e.,

$$x^{n_j} \to x^* \quad\text{as } j \to \infty. \tag{5.11}$$

Thus, there exists a sufficiently large positive integer $j_0$ such that $\|x^{n_j} - x^*\|_2 < \eta_{\mu,q}$ when $j \ge j_0$. Moreover, by Property 3.2, there also exists a sufficiently large positive integer $n^* > N$ such that $\|x^n - x^{n+1}\|_2 < \eta_{\mu,q}$ when $n \ge n^*$. Without loss of generality, we let $n^* = n_{j_0}$. In the following, we first prove that $I^n = I$ and $\mathrm{sgn}(x^n) = \mathrm{sgn}(x^*)$ whenever $n \ge n^*$.

In order to prove $I^n = I$, we first show that $I^{n_j} = I$ when $j \ge j_0$ and then verify that $I^{n+1} = I^n$ when $n \ge n^*$. We now prove by contradiction that $I^{n_j} = I$ whenever $j \ge j_0$. Assume this is not the case, namely, that $I^{n_j} \ne I$. Then we easily derive a contradiction by distinguishing the following two possible cases:

Case 1: $I^{n_j} \ne I$ and $I^{n_j} \cap I \subsetneq I^{n_j}$. In this case, there exists an $i_{n_j}$ such that $i_{n_j} \in I^{n_j} \setminus I$. By Theorem 3.7(a), it then implies

$$\|x^{n_j} - x^*\|_2 \ge |x^{n_j}_{i_{n_j}}| \ge \min_{i\in I^{n_j}}|x_i^{n_j}| \ge \eta_{\mu,q},$$

which contradicts $\|x^{n_j} - x^*\|_2 < \eta_{\mu,q}$.

Case 2: $I^{n_j} \ne I$ and $I^{n_j} \cap I = I^{n_j}$. In this case, it is obvious that $I^{n_j} \subset I$. Thus, there exists an $i^*$ such that $i^* \in I \setminus I^{n_j}$. By Lemma 2.1(a), we still have

$$\|x^{n_j} - x^*\|_2 \ge |x^*_{i^*}| \ge \min_{i\in I}|x_i^*| \ge \eta_{\mu,q},$$

and it contradicts $\|x^{n_j} - x^*\|_2 < \eta_{\mu,q}$.

Thus we have justified that $I^{n_j} = I$ when $j \ge j_0$. Similarly, it can also be claimed that $I^{n+1} = I^n$ whenever $n \ge n^*$. Therefore, whenever $n \ge n^*$, it holds $I^n = I$.

As $I^n = I$ when $n \ge n^*$, it suffices to check that $\mathrm{sgn}(x_i^n) = \mathrm{sgn}(x_i^*)$ for any $i \in I$. Similar to the first part of the proof, we will first check that $\mathrm{sgn}(x_i^{n_j}) = \mathrm{sgn}(x_i^*)$, and then $\mathrm{sgn}(x_i^{n+1}) = \mathrm{sgn}(x_i^n)$ for any $i \in I$ by contradiction. We now prove $\mathrm{sgn}(x_i^{n_j}) = \mathrm{sgn}(x_i^*)$ for any $i \in I$. Assume this is not the case. Then there exists an $i^* \in I$ such that $\mathrm{sgn}(x_{i^*}^{n_j}) \ne \mathrm{sgn}(x_{i^*}^*)$, and hence,

$$\mathrm{sgn}(x_{i^*}^{n_j})\,\mathrm{sgn}(x_{i^*}^*) = -1.$$

From Lemma 2.1(a) and Theorem 3.7(a), it then implies

$$\begin{aligned}
\|x^{n_j} - x^*\|_2 &\ge |x_{i^*}^{n_j} - x_{i^*}^*| = |x_{i^*}^{n_j}| + |x_{i^*}^*| \\
&\ge \min_{i\in I}\{|x_i^{n_j}| + |x_i^*|\} \ge 2\eta_{\mu,q},
\end{aligned}$$

contradicting again $\|x^{n_j} - x^*\|_2 < \eta_{\mu,q}$. This contradiction shows $\mathrm{sgn}(x^{n_j}) = \mathrm{sgn}(x^*)$. Similarly, we can also show that $\mathrm{sgn}(x^{n+1}) = \mathrm{sgn}(x^n)$ whenever $n \ge n^*$. Therefore, $\mathrm{sgn}(x^n) = \mathrm{sgn}(x^*)$ when $n \ge n^*$.

With this, the proof of Theorem 3.7 is completed. ■

5.5. Proof of Lemma 3.9

Proof. We assume that $n + 1 = j^*K + i^*$ for some positive integers $j^* \ge 1$ and $1 \le i^* \le K$. For simplicity, let

$$i^* = K. \tag{5.12}$$

If not, we can renumber the indices of the coordinates such that (5.12) holds while the iterative sequence $\{u^n\}$ is kept invariant, since the updating rule (3.5) is cyclic and thus periodic. Such an operation can be described as follows: for each $n \ge K$, by Lemma 3.8(b), we know that the coefficients of $u^n$ are only related to the previous $K - 1$ iterates. Thus, we consider the following period of the original updating order, i.e.,

$$\{i^* + 1, \cdots, K, 1, \cdots, i^*\},$$

then we can renumber the above coordinate updating order as

$$\{1', \cdots, (K - i^*)', (K - i^* + 1)', \cdots, K'\},$$

with

$$j' = \begin{cases} i^* + j, & \text{if } 1 \le j \le K - i^*, \\ j - (K - i^*), & \text{if } K - i^* < j \le K. \end{cases}$$

In the following, we will calculate $\nabla_i T(u^{n+1})$ recursively for $i = K, K - 1, \cdots, 1$. Specifically,

(a) For $i = K$, by Lemma 3.8(d), it holds

$$\nabla_K T(u^{n+1}) = \left(\frac{1}{\mu} - B_K^T B_K\right)(u_K^n - u_K^{n+1}). \tag{5.13}$$

For any $i = K - 1, K - 2, \cdots, 1$,

$$\nabla_i T(u^{n+1}) = B_i^T(Bu^{n+1} - y) + \lambda q\,\mathrm{sgn}(u_i^{n+1})|u_i^{n+1}|^{q-1},$$

and $u_i^{n+1} = u_i^n$. Therefore, for $i = K - 1, K - 2, \cdots, 1$,

$$\nabla_i T(u^{n+1}) = \nabla_i T(u^n) + B_i^T B_K(u_K^{n+1} - u_K^n). \tag{5.14}$$

(b) For $i = K - 1$, since $n = j^*K + (K - 1)$, then by Lemma 3.8(d) again, it holds

$$\nabla_{K-1} T(u^n) = \left(\frac{1}{\mu} - B_{K-1}^T B_{K-1}\right)(u_{K-1}^{n-1} - u_{K-1}^n). \tag{5.15}$$

By Lemma 3.8(b), it implies

$$u_{K-1}^n = u_{K-1}^{n+1}.$$

Thus,

$$\nabla_{K-1} T(u^n) = \left(\frac{1}{\mu} - B_{K-1}^T B_{K-1}\right)(u_{K-1}^{n-1} - u_{K-1}^{n+1}). \tag{5.16}$$

Combining (5.14) with (5.16),

$$\nabla_{K-1} T(u^{n+1}) = \left(\frac{1}{\mu} - B_{K-1}^T B_{K-1}\right)(u_{K-1}^{n-1} - u_{K-1}^{n+1}) + B_{K-1}^T B_K(u_K^{n+1} - u_K^n). \tag{5.17}$$

Similarly to (5.14), for $i = K - 2, K - 3, \cdots, 1$, we have

$$\nabla_i T(u^n) = \nabla_i T(u^{n-1}) + B_i^T B_{K-1}(u_{K-1}^n - u_{K-1}^{n-1}). \tag{5.18}$$

(c) For any $i = K - j$ with $0 \le j \le K - 1$, recursively, we have

$$\begin{aligned}
\nabla_{K-j} T(u^{n+1}) &= \nabla_{K-j} T(u^n) + B_{K-j}^T B_K(u_K^{n+1} - u_K^n) \\
&= \nabla_{K-j} T(u^{n-1}) + B_{K-j}^T\sum_{k=0}^{1} B_{K-k}(u_{K-k}^{n+1-k} - u_{K-k}^{n-k}) \\
&= \cdots \\
&= \nabla_{K-j} T(u^{n-j+1}) + B_{K-j}^T\sum_{k=0}^{j-1} B_{K-k}(u_{K-k}^{n+1-k} - u_{K-k}^{n-k}).
\end{aligned} \tag{5.19}$$

Moreover, Lemma 3.8(d) gives

$$\nabla_{K-j} T(u^{n-j+1}) = \left(\frac{1}{\mu} - B_{K-j}^T B_{K-j}\right)(u_{K-j}^{n-j} - u_{K-j}^{n-j+1}). \tag{5.20}$$

Plugging (5.20) into (5.19), it holds

$$\nabla_{K-j} T(u^{n+1}) = \frac{1}{\mu}(u_{K-j}^{n-j} - u_{K-j}^{n-j+1}) + \sum_{k=0}^{j} B_{K-j}^T B_{K-k}(u_{K-k}^{n+1-k} - u_{K-k}^{n-k}), \tag{5.21}$$

for $j = 0, 1, \cdots, K - 1$. Furthermore, by Lemma 3.8(b), it implies

$$u_{K-k}^{n+1-k} = u_{K-k}^{n+1} \quad\text{and}\quad u_{K-k}^{n-k} = u_{K-k}^n$$

for $0 \le k \le K - 1$. Therefore, (5.21) becomes

$$\nabla_{K-j} T(u^{n+1}) = \frac{1}{\mu}(u_{K-j}^n - u_{K-j}^{n+1}) + \sum_{k=0}^{j} B_{K-j}^T B_{K-k}(u_{K-k}^{n+1} - u_{K-k}^n), \tag{5.22}$$

for $j = 0, 1, \cdots, K - 1$.

Furthermore, by (5.22), it implies

$$\begin{aligned}
|\nabla_{K-j} T(u^{n+1})| &\le \frac{1}{\mu}|u_{K-j}^n - u_{K-j}^{n+1}| + \sum_{k=0}^{j}|B_{K-j}^T B_{K-k}|\,|u_{K-k}^{n+1} - u_{K-k}^n| \\
&\le \frac{1}{\mu}|u_{K-j}^{n+1} - u_{K-j}^n| + \delta\|u^{n+1} - u^n\|_1
\end{aligned} \tag{5.23}$$

for $j = 0, 1, \cdots, K - 1$, where the second inequality holds because

$$\delta = \max_{i,j = 1, \cdots, K}|B_i^T B_j|$$

and

$$\sum_{k=0}^{j}|u_{K-k}^{n+1} - u_{K-k}^n| \le \|u^{n+1} - u^n\|_1.$$

Summing $|\nabla_{K-j} T(u^{n+1})|$ with respect to $j$ gives

$$\begin{aligned}
\|\nabla T(u^{n+1})\|_1 &\le \frac{1}{\mu}\|u^{n+1} - u^n\|_1 + K\delta\|u^{n+1} - u^n\|_1 \\
&\le \left(\frac{1}{\mu} + K\delta\right)\sqrt{K}\,\|u^{n+1} - u^n\|_2,
\end{aligned} \tag{5.24}$$

where the second inequality holds by the norm inequality between the 1-norm and the 2-norm, that is,

$$\|u\|_2 \le \|u\|_1 \le \sqrt{K}\|u\|_2, \tag{5.25}$$

for any $u \in \mathbb{R}^K$. Also, combining (5.25) and (5.24) implies

$$\|\nabla T(u^{n+1})\|_2 \le \left(\frac{1}{\mu} + K\delta\right)\sqrt{K}\,\|u^{n+1} - u^n\|_2. \quad ■$$
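For a given matrix $B$ and step size $\mu$, the constant $b = \left(\frac{1}{\mu} + K\delta\right)\sqrt{K}$ of Lemma 3.9 can be computed directly. The helper below is a small illustration; its name `relative_error_constant` is an assumption.

```python
import numpy as np

def relative_error_constant(B, mu):
    """Compute b = (1/mu + K*delta)*sqrt(K) with delta = max_{i,j} |B_i^T B_j|."""
    K = B.shape[1]                              # number of columns of B
    delta = float(np.max(np.abs(B.T @ B)))      # largest absolute entry of the Gram matrix
    return (1.0 / mu + K * delta) * np.sqrt(K)

# With B = I_3 and mu = 0.5: delta = 1, so b = (2 + 3) * sqrt(3).
print(relative_error_constant(np.eye(3), 0.5))
```

The constant grows like $K^{3/2}$ in the support size, so the relative error bound is loose for large supports; it is used here only to verify the hypotheses of the KL-based convergence theorem.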


5.6. Proof of Theorem 3.11

Proof. Let $F(x) = \frac{1}{2}\|Ax - y\|_2^2$ and

$$\phi_1(x_I^*) = (q\,\mathrm{sgn}(x^*_{i_1})|x^*_{i_1}|^{q-1}, \cdots, q\,\mathrm{sgn}(x^*_{i_K})|x^*_{i_K}|^{q-1})^T,$$

where $i_j \in I$, $j = 1, \cdots, K$. By Lemma 2.1(b) we have

$$A_I^T(Ax^* - y) + \lambda\phi_1(x_I^*) = 0. \tag{5.26}$$

This together with the condition of the theorem

$$A_I^T A_I + \lambda q(q-1)\Lambda(x_I^*) \succ 0$$

implies that the second-order optimality conditions hold at $x^* = (x_I^*, 0)$. For a sufficiently small vector $h$, we denote $x_h^* = (x_I^* + h_I, 0)$. It then follows that

$$F(x_h^*) + \lambda\sum_{i\in I}|x_i^* + h_i|^q \ge F(x^*) + \lambda\sum_{i\in I}|x_i^*|^q. \tag{5.27}$$

Furthermore, for any $q \in (0, 1)$, it obviously holds that

$$t^q > (\|\nabla_{I^c}F(x^*)\|_\infty + 2)\,t/\lambda,$$

for sufficiently small $t > 0$. By this fact and the differentiability of $F$, one can observe that for sufficiently small $h$, there holds

$$\begin{aligned}
F(x^* + h) - F(x_h^*) + \lambda\sum_{i\in I^c}|h_i|^q &= \nabla_{I^c}F(x^*)^T h_{I^c} + \lambda\sum_{i\in I^c}|h_i|^q + o(h_{I^c}) \\
&\ge \sum_{i\in I^c}\left(\|\nabla_{I^c}F(x^*)\|_\infty - [\nabla_{I^c}F(x^*)]_i + 1\right)|h_i| \ge 0.
\end{aligned} \tag{5.28}$$

Summing up the above two inequalities (5.27)-(5.28), one has that for all sufficiently small $h$,

$$T_\lambda(x^* + h) - T_\lambda(x^*) \ge 0, \tag{5.29}$$

and hence $x^*$ is a local minimizer.

Actually, we can observe that when $h \ne 0$, at least one of the two inequalities (5.27) and (5.28) holds strictly, which implies that $x^*$ is a strictly local minimizer. ■

5.7. Kurdyka-Lojasiewicz Inequality

(a) The function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is said to have the Kurdyka-Lojasiewicz property at $x^* \in \mathrm{dom}\,\partial f$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $x^*$ and a continuous concave function $\varphi : [0, \eta) \to \mathbb{R}_+$ such that:

(i) $\varphi(0) = 0$;

(ii) $\varphi$ is $C^1$ on $(0, \eta)$;

(iii) for all $s \in (0, \eta)$, $\varphi'(s) > 0$;

(iv) for all $x$ in $U \cap \{x : f(x^*) < f(x) < f(x^*) + \eta\}$, the Kurdyka-Lojasiewicz inequality holds

$$\varphi'(f(x) - f(x^*))\,\mathrm{dist}(0, \partial f(x)) \ge 1. \tag{5.30}$$

(b) Proper lower semi-continuous functions which satisfy the Kurdyka-Lojasiewicz inequality at each point of $\mathrm{dom}\,\partial f$ are called KL functions.

Functions satisfying the KL inequality include real analytic functions, semialgebraic functions and locally strongly convex functions.
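As a simple illustration of the definition (an example chosen here, not taken from the paper), consider $f(x) = x^2$ on $\mathbb{R}$ at $x^* = 0$ with $\eta = +\infty$, $U = \mathbb{R}$ and the desingularizing function $\varphi(s) = 2\sqrt{s}$:

```latex
\varphi(0) = 0, \qquad
\varphi'(s) = s^{-1/2} > 0 \ \text{ for } s > 0, \qquad
\varphi'\bigl(f(x) - f(0)\bigr)\,\mathrm{dist}\bigl(0, \partial f(x)\bigr)
  = \frac{1}{|x|}\cdot 2|x| = 2 \ge 1 \quad (x \ne 0).
```

Here $\varphi$ is continuous and concave on $[0, +\infty)$ and $C^1$ on $(0, +\infty)$, so conditions (i)-(iv) all hold and (5.30) is satisfied with room to spare.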